Predicting Statistical Properties of Open Reading Frames in Bacterial Genomes
Abstract
An analytical model was developed based on statistical properties of the open reading frames (ORFs) of eubacterial genomes, such as codon composition and the sequence length of all reading frames. The model predicts the average length, the maximum length, and the length distribution of the ORFs of 70 species with GC contents between 21% and 74%. It also predicts the number of annotated genes with close agreement. However, the ORF length distribution in the five alternative reading frames deviates from the predicted distribution: long ORFs appear more often than statistically expected. This unexpected depletion of stop codons in the alternative reading frames cannot be completely explained by biased codon usage in the +1 frame. Whether the stop codon depletion has a biological function is unknown, but it could arise if alternative ORFs have protein-coding capacity and thereby exert a selection pressure that prevents the fixation of stop codon mutations. Comparing the analytical model with bacterial genomes therefore leads to a hypothesis that points to novel gene candidates, which can now be investigated in wet lab experiments.
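The abstract does not give the model's equations, so the following is only a minimal sketch of a textbook null model, not the authors' method. It assumes independent nucleotides with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2, treats an ORF as a stop-to-stop stretch, and uses the three standard stop codons. Under those assumptions the ORF length in codons is geometrically distributed. All function names are hypothetical.

```python
# Null-model sketch (assumption: independent nucleotides, stop-to-stop ORFs).
# This is NOT the model from the paper; it only illustrates the idea of
# predicting ORF length statistics from sequence composition.

STOP_CODONS = ("TAA", "TAG", "TGA")  # standard bacterial stop codons


def stop_probability(gc: float) -> float:
    """Probability that a random codon is a stop codon at GC content `gc`."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    return sum(p[c[0]] * p[c[1]] * p[c[2]] for c in STOP_CODONS)


def mean_orf_length(gc: float) -> float:
    """Expected number of codons up to and including the first stop codon."""
    return 1.0 / stop_probability(gc)


def prob_longer_than(gc: float, n_codons: int) -> float:
    """Probability that the first `n_codons` codons contain no stop codon."""
    return (1.0 - stop_probability(gc)) ** n_codons


def expected_count_longer_than(gc: float, n_codons: int, n_orfs: int) -> float:
    """Expected number of ORFs longer than `n_codons` among `n_orfs` random ORFs."""
    return n_orfs * prob_longer_than(gc, n_codons)


if __name__ == "__main__":
    # GC range taken from the abstract (21% to 74%), plus the unbiased case.
    for gc in (0.21, 0.50, 0.74):
        print(gc, stop_probability(gc), mean_orf_length(gc))
```

In this sketch a GC-rich genome has a lower stop probability and therefore longer expected random ORFs. An observed excess of long ORFs over `expected_count_longer_than` would be the kind of deviation the abstract describes for the alternative reading frames.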
Similar resources
Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes.
A substantial fraction of hypothetical open reading frames (ORFs) in completely sequenced bacterial genomes are short, suggesting that many are not genes but random stretches of DNA. Although it is not feasible to authenticate the coding capacity of all such regions experimentally, comparisons of ORFs in related genomes can expose those that encode functional proteins.
Compilation and analysis of group II intron insertions in bacterial genomes: evidence for retroelement behavior.
Group II introns are novel genetic elements that have properties of both catalytic RNAs and retroelements. Initially identified in organellar genomes of plants and lower eukaryotes, group II introns are now being discovered in increasing numbers in bacterial genomes. Few of the newly sequenced bacterial introns are correctly identified or annotated by those who sequenced them. Here we have comp...
The Random Nature of Genome Architecture: Predicting Open Reading Frame Distributions
BACKGROUND: A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e. the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes. METHODOLOGY/PRINCIPAL FINDINGS: By fitting ...
Statistical Properties of Open Reading Frames in Complete Genome Sequences
Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (17 prokaryotic genomes and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The issue of the infl...
Estimation of the Number of Orphan Genes in the Genome Sequences
In annotations of genome sequences, a considerable fraction of putative proteins have no sequence similarity to known proteins. These genes or open reading frames (ORFs) have been referred to as “orphans”. Some of these putative proteins may have crucial organism-specific functions. On the other hand, it has been reported that some of the annotated genes in sequenced bacterial genome...
Journal title:
Volume 7, issue
Pages -
Publication date: 2012